************************************************************************************************************
Replication Files for "How to Train Your Stochastic Parrot: Large Language Models for Political Texts"
Authors: Ornstein, Blasingame, & Truscott (2024)
Date Created: 2024-10-09
************************************************************************************************************

There are 34 files included in this repository. To reproduce all figures and tables, run the '__main.R' script.

Scripts:
- __main.R (calls the following scripts in turn, reproducing all tables and figures from the paper)
- figure1.R
- figure2.R
- figure3.R
- figure4.R
- figure-A1.R
- figure-A2.R
- figure-B1.R
- figure-B2.R
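
The run order described above can be sketched as follows. This is only an illustration of a driver that sources each figure script in turn; it is not the contents of '__main.R', which may differ. The function name run_scripts and its dir argument are hypothetical.

```r
# Hypothetical sketch of a driver script; the actual '__main.R' may differ.
# The file names are taken from the script list in this README.
scripts <- c(
  "figure1.R", "figure2.R", "figure3.R", "figure4.R",
  "figure-A1.R", "figure-A2.R", "figure-B1.R", "figure-B2.R"
)

# run_scripts() is an illustrative helper, not part of the repository.
run_scripts <- function(files, dir = ".") {
  paths <- file.path(dir, files)

  # Stop early if any script is missing, rather than failing midway.
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }

  # Source each script in its own environment so objects do not leak
  # from one figure script into the next.
  for (p in paths) {
    message("Running ", p)
    source(p, local = new.env())
  }
  invisible(paths)
}

# Usage, from the repository root:
# run_scripts(scripts)
```

Sourcing into a fresh environment is a design choice of this sketch; if the figure scripts rely on shared objects, source them into the global environment instead.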

Data:
- application1-few-shot-gpt-3.RData (few-shot GPT-3 labels for Twitter sentiment application)
- application1-few-shot-gpt-4.RData (few-shot GPT-4 labels for Twitter sentiment application)
- application1-TweetNLP.csv (TweetNLP labels for Twitter sentiment application)
- application1-naive-bayes.RData (Naive Bayes labels for Twitter sentiment application)
- application2-carlson-montgomery-2017.csv (cleaned dataset from Carlson & Montgomery, 2017)
- application2-one-shot-gpt-3.RData (one-shot GPT-3 labels for political ad tone application)
- application2-one-shot-gpt-4.RData (one-shot GPT-4 labels for political ad tone application)
- application3-benoit-manifesto-estimates.csv (manifesto-level estimates transcribed from Benoit et al. 2016)
- application3-one-shot-gpt-3-policy.csv (one-shot GPT-3 sentence-level policy labels)
- application3-one-shot-gpt-3-ideology.csv (one-shot GPT-3 sentence-level ideology labels)
- application4-one-shot-gpt-3.csv (one-shot GPT-3 topic labels for Congressional speech application)
- appendix-A-*.csv (12 sets of GPT-3 labels for Twitter sentiment application, varying prompt and model variant)
- appendix-B-tweets.RData (GPT-4 estimates of Twitter sentiment for Appendix Figure B1)
- appendix-B-manifestos.RData (GPT-4 manifesto-level estimates for Appendix Figure B2)
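
For readers who want to inspect the data files directly, a minimal sketch in base R follows. It assumes the files sit in the working directory (or in the directory passed as dir). The helper names load_rdata and read_appendix_a are hypothetical, and the figure scripts may read the data differently (for example with readr).

```r
# Illustrative helpers only; neither function is part of the repository.

# Load an .RData file into a list instead of the global environment,
# so the names of the stored objects can be inspected first.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

# Read every file matching appendix-A-*.csv into a named list of data frames.
read_appendix_a <- function(dir = ".") {
  files <- list.files(dir, pattern = "^appendix-A-.*\\.csv$", full.names = TRUE)
  labels <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(labels) <- basename(files)
  labels
}

# Usage, from the repository root:
# gpt4 <- load_rdata("application1-few-shot-gpt-4.RData")
# names(gpt4)                      # object names stored in the file
# appendix_a <- read_appendix_a()  # one data frame per prompt/model variant
```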

We produced our results with the following R version, platform, and packages:

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

attached packages:
 [1] tidytext_0.4.2   ggrepel_0.9.6    patchwork_1.2.0  sentimentr_2.9.0
 [5] promptr_1.0.0    lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1   
 [9] dplyr_1.1.4      purrr_1.0.2      readr_2.1.5      tidyr_1.3.1     
[13] tibble_3.2.1     ggplot2_3.5.1    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] janeaustenr_1.0.0 utf8_1.2.4        generics_0.1.3    lattice_0.22-6   
 [5] stringi_1.8.4     hms_1.1.3         magrittr_2.0.3    textshape_1.7.5  
 [9] grid_4.4.1        timechange_0.3.0  Matrix_1.7-0      syuzhet_1.0.7    
[13] mgcv_1.9-1        fansi_1.0.6       scales_1.3.0      textshaping_0.4.0
[17] cli_3.6.3         rlang_1.1.4       crayon_1.5.3      tokenizers_0.3.0 
[21] splines_4.4.1     bit64_4.5.2       munsell_0.5.1     withr_3.0.1      
[25] tools_4.4.1       parallel_4.4.1    tzdb_0.4.0        textclean_0.9.3  
[29] colorspace_2.1-1  vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.4  
[33] bit_4.5.0         vroom_1.6.5       ragg_1.3.3        pkgconfig_2.0.3  
[37] pillar_1.9.0      gtable_0.3.5      Rcpp_1.0.13       data.table_1.16.0
[41] glue_1.7.0        systemfonts_1.1.0 tidyselect_1.2.1  rstudioapi_0.16.0
[45] farver_2.1.2      SnowballC_0.7.1   nlme_3.1-164      labeling_0.4.3   
[49] compiler_4.4.1    qdapRegex_0.7.8   lexicon_1.2.1  
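
To compare a local installation against the versions listed above, something like the following sketch may help. The expected vector copies a few of the attached package versions from this README; check_versions is a hypothetical helper, and matching versions exactly is an assumption of this sketch rather than a stated requirement of the replication.

```r
# A subset of the attached package versions listed in this README.
expected <- c(
  tidytext   = "0.4.2",
  ggrepel    = "0.9.6",
  patchwork  = "1.2.0",
  sentimentr = "2.9.0",
  promptr    = "1.0.0",
  tidyverse  = "2.0.0"
)

# check_versions() is illustrative and not part of the repository.
# It returns one row per package with the expected and installed versions;
# 'installed' is NA when the package is not installed.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))

  data.frame(
    package   = names(expected),
    expected  = unname(expected),
    installed = unname(installed),
    matches   = unname(!is.na(installed) & installed == expected),
    row.names = NULL
  )
}

# Usage:
# check_versions(expected)
```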